Text Chunking by Combining Hand-Crafted Rules and Memory-Based Learning

نویسندگان

  • Seong-Bae Park
  • Byoung-Tak Zhang
چکیده

This paper proposes a hybrid of handcrafted rules and a machine learning method for chunking Korean. In the partially free word-order languages such as Korean and Japanese, a small number of rules dominate the performance due to their well-developed postpositions and endings. Thus, the proposed method is primarily based on the rules, and then the residual errors are corrected by adopting a memory-based machine learning method. Since the memory-based learning is an efficient method to handle exceptions in natural language processing, it is good at checking whether the estimates are exceptional cases of the rules and revising them. An evaluation of the method yields the improvement in F-score over the rules or various machine learning methods alone.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Phrase Chunking Using Entropy Guided Transformation Learning

Entropy Guided Transformation Learning (ETL) is a new machine learning strategy that combines the advantages of decision trees (DT) and Transformation Based Learning (TBL). In this work, we apply the ETL framework to four phrase chunking tasks: Portuguese noun phrase chunking, English base noun phrase chunking, English text chunking and Hindi text chunking. In all four tasks, ETL shows better r...

متن کامل

Combining Hand-crafted Rules and Unsupervised Learning in Constraint-based Morphological Disambiguation

This paper presents a constraint-based morphological disambiguation approach that is applicable languages with complex morphology-specifically agglutinative languages with productive inflectional and derivational morphological phenomena. In certain respects, our approach has been motivated by Brill's recent work (Brill, 1995b), but with the observation that his transformational approach is not ...

متن کامل

Memory-Based Acquisition of Argument Structures and its Application to Implicit Role Detection

We propose a generic, memory-based approach for the detection of implicit semantic roles. While state-of-the-art methods for this task combine hand-crafted rules with specialized and costly lexical resources, our models use large corpora with automated annotations for explicit semantic roles only to capture the distribution of predicates and their associated roles. We show that memory-based lea...

متن کامل

Minimally Supervised Japanese Named Entity Recognition: Resources and Evaluation

Approaches to named entity recognition that rely on hand-crafted rules and/or supervised learning techniques have limitations in terms of their portability into new domains as well as in the robustness over time. For the purpose of overcoming those limitations, this paper evaluates named entity chunking and classi cation techniques in Japanese named entity recognition in the context of minimall...

متن کامل

A Fast Boosting-based Learner for Feature-Rich Tagging and Chunking

Combination of features contributes to a significant improvement in accuracy on tasks such as part-of-speech (POS) tagging and text chunking, compared with using atomic features. However, selecting combination of features on learning with large-scale and feature-rich training data requires long training time. We propose a fast boosting-based algorithm for learning rules represented by combinati...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003